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Abstract 

This paper presents a hierarchical Bayesian model to reconstruct sparse images when the 
observations are obtained from linear transformations and corrupted by an additive white Gaussian 
noise. Our hierarchical Bayes model is well suited to such naturally sparse image applications as 
it seamlessly accounts for properties such as sparsity and positivity of the image via appropriate 
Bayes priors. We propose a prior that is based on a weighted mixture of a positive exponential 
distribution and a mass at zero. The prior has hyperparameters that are tuned automatically by 
marginalization over the hierarchical Bayesian model. To overcome the complexity of the posterior 
distribution, a Gibbs samphng strategy is proposed. The Gibbs samples can be used to estimate 
the image to be recovered, e.g. by maximizing the estimated posterior distribution. In our fully 
Bayesian approach the posteriors of all the parameters are available. Thus our algorithm provides 
more information than other previously proposed sparse reconstruction methods that only give a 
point estimate. The performance of the proposed hierarchical Bayesian sparse reconstruction method 
is illustrated on synthetic data and real data collected from a tobacco virus sample using a prototype 
MRFM instrument. 

Index Terms 

Deconvolution, MRFM imaging, sparse representation, Bayesian inference, MCMC methods. 
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I. Introduction 

For several decades, image deconvolution has been of increasing interest [2], [47]. Image 
deconvolution is a method for reconstructing images from observations provided by optical 
or other devices and may include denoising, deblurring or restoration. The applications 
are numerous including astronomy [49], medical imagery [48], remote sensing [41] and 
photography [55]. More recently, a new imaging technology, called Magnetic Resonance 
Force Microscopy (MRFM), has been developed (see [38] and [29] for reviews). This non- 
destructive method allows one to improve the detection sensitivity of standard magnetic 
resonance imaging (MRI) [46]. Three dimensional MRI at 4nm spatial resolution has recently 
been achieved by the IBM MRFM prototype for imaging the proton density of a tobacco 
virus [8]. Because of its potential atomic-level resolution^ the 2-dimensional or 3-dimensional 
images resulting from this technology are naturally sparse in the standard pixel basis. Indeed, 
as the observed objects are molecules, most of the image is empty space. In this paper, a 
hierarchical Bayesian model is proposed to perform reconstruction of such images. 

Sparse signal and image deconvolution has motivated research in many scientific applica- 
tions including: spectral analysis in astronomy [4]; seismic signal analysis in geophysics [7], 
[45]; and deconvolution of ultrasonic B-scans [39]. We propose here a hierarchical Bayesian 
model that is based on selecting an appropriate prior distribution for the unknown image 
and other unknown parameters. The image prior is composed of a weighted mixture of a 
standard exponential distribution and a mass at zero. When the non-zero part of this prior 
is chosen to be a centered normal distribution, this prior reduces to a Bernoulli-Gaussian 
process. This distribution has been widely used in the literature to build Bayesian estimators 
for sparse deconvolution problems (see [5], [16], [24], [28], [33] or more recently [3] and 
[17]). However, choosing a distribution with heavier tail may improve the sparsity inducement 
of the prior. Combining a Laplacian distribution with an atom at zero results in the so-called 
LAZE prior. This distribution has been used in [27] to solve a general denoising problem in 
a non-Bayesian quasi-maximum likelihood estimation framework. In [52], [54], this prior has 
also been used for sparse reconstruction of noisy images, including MRFM. The principal 
weakness of these previous approaches is the sensitivity to hyperparameters that determine 
the prior distribution, e.g. the LAZE mixture coefficient and the weighting of the prior vs the 

'Note that the current state of art of the MRFM technology allows one to acquire images with nanoscale resolution. 
However, atomic-level resolution might be obtained in the future. 
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likelihood function. The hierarchical Bayesian approach proposed in this paper circumvents 
these difficulties. Specifically, a new prior composed of a mass at zero and a single-sided 
exponential distribution is introduced, which accounts for positivity and sparsity of the pixels 
in the image. Conjugate priors on the hyperparameters of the image prior are introduced. It 
is this step that makes our approach hierarchical Bayesian. The full Bayesian posterior can 
then be derived from samples generated by Markov chain Monte Carlo (MCMC) methods 
[44]. 

The estimation of hyperparameters involved in the prior distribution described above is the 
most difficult task and poor estimation leads to instability. Empirical Bayes (EB) and Stein 
unbiased risk (SURE) solutions were proposed in [52], [54] to deal with this issue. However, 
instability was observed especially at higher signal-to-noise ratios (SNR). In the Bayesian 
estimation framework, two approaches are available to estimate these hyperparameters. One 
approach couples MCMC methods to an expectation-maximization (EM) algorithm or to a 
stochastic EM algorithm [30], [32] to maximize a penalized likelihood function. The second 
approach defines non-informative prior distributions for the hyperparameters; introducing a 
second level of hierarchy into the Bayesian formulation. This latter fully Bayesian approach, 
adopted in this paper, has been successfully applied to signal segmentation [11], [14], [15] 
and semi-supervised unmixing of hyperspectral imagery [13]. 

Only a few papers have been published on reconstruction of images from MRFM data [6], 
[8], [56], [58]. In [21], several techniques based on linear filtering and maximum-likelihood 
principles were proposed that do not exploit image sparsity. More recently, Ting et al. has 
introduced sparsity penalized reconstruction methods for MRFM (see [54] or [53]). The 
reconstruction problem has been formulated as a decomposition into a deconvolution step and 
a denoising step, yielding an iterative thresholding framework. In [54] the hyperparameters are 
estimated using penalized log-likelihood criteria including the SURE approach [50]. Despite 
promising results, especially at low SNR, penalized likelihood approaches require iterative 
maximization algorithms that are often slow to converge and can get stuck on local maxima 
[10]. In contrast to [54], the fully Bayesian approach presented in this paper converges 
quickly and produces estimates of the entire posterior and not just the local maxima. Indeed, 
the hierarchical Bayesian formulation proposed here generates Bayes-optimal estimates of all 
image parameters, including the hyperparameters. 

In this paper, the response of the MRFM imaging device is assumed to be known. While 
it may be possible to extend our methods to unknown point spread functions, e.g., along 
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the lines of [22], [23], the case of sparse blind deconvolution is outside of the scope of this 
paper. 

This paper is organized as follows. The deconvolution problem is formulated in Section II. 
The hierarchical Bayesian model is described in Section EI. Section IV presents a Gibbs 
sampler that allows one to generate samples distributed according to the posterior of interest. 
Simulation results, including extensive performance comparison, are presented in Section V. 

In Section VI we apply our hierarchical Bayesian method to reconstruction of a tobacco virus 
from real MRFM data. Our main conclusions are reported in Section VII. 

II. Problem formulation 

Let X denote a Zi x . . . x Z„ unknown n-dimensional pixelated image to be recovered (e.g. 
n — 2 or n — 3). Observed are a collection of P projections y = [yi, . . . , yp]^ which are 
assumed to follow the model: 

y = T(K,X) + n, (1) 

where T" (•, •) stands for a bilinear function, n is a P x 1 dimension noise vector and k is the 
kernel that characterizes the response of the imaging device. In the right-hand side of (1), n 
is an additive Gaussian noise sequence distributed according to n ~ A/" (0, cr^Ip), where the 
variance is assumed to be unknown. 

Note that in standard deblurring problems, the function T (•, •) represents the standard n- 
dimensional convolution operator (g). In this case, the image X can be vectorized yielding 
the unknown image x e with M — P — I1I2 . . .In- With this notation, Eq. (1) can be 
rewritten: 

y = Hx + n or Y = At (g) X + N (2) 

where y (resp. n) stands for the vectorized version of Y (resp. N) and H is an P x M 
matrix that describes convolution by the point spread function (psf) k. 

The problem addressed in the following sections consists of estimating x and under 
sparsity and positivity constraints on x given the observations y, the psf k and the bilinear 
function^ T (•,•). 

^In the following, for sake of conciseness, the same notation T {■,■) will be adopted for the bilinear operations used on 
n-dimensional images X and used on M x 1 vectorized images x. 
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III. Hierarchical Bayesian model 

A. Likelihood function 

The observation model defined in (1) and the Gaussian properties of the noise sequence 
n yield: 

^"Pl^ 2^^ J' 

where ||-|| denotes the standard £2 norm: ||x||^ = x^x. 

B. Parameter prior distributions 

The unknown parameter vector associated with the observation model defined in (1) is 
6 = {x, (T^}. In this section, we introduce prior distributions for these two parameters; which 
are assumed to be independent. 

1 ) Image prior: First consider the exponential distribution with shape parameter a > 0: 

9a (xi) = - exp f-— ) Ir* {xi) , (4) 



a \ a 

where 1e (x) is the indicator function defined on E: 

f 1, ifxeE, 
l^ix)^{ (5) 
I 0, otherwise. 

Choosing ga{-) as prior distributions for (i = 1, . . . , M) leads to a maximum a posteriori 
(MAP) estimator of x that corresponds to a maximum £1 -penalized likelihood estimate with a 
positivity constraint^. Indeed, assuming the component Xi (i — 1, . . . , P) a priori independent 

allows one to write the full prior distribution for x = [xi, . . . ,xm]^'- 

1 \ M / II I 
1 \ / X 



^" " \a) V ^) ^^""^^^ ' 

where {x ^ 0} = {x G R^-^; > 0, Vi = 1, . . . , M] and ||-||^ is the standard ti norm ||x|| -, — 
^Jxil. This estimator has interesting sparseness properties for Bayesian estimation [1] and 
signal representation [20]. 

Coupling a standard probability density function (pdf) with an atom at zero is another 
alternative to encourage sparsity. This strategy has for instance been used for located event 
detection [28] such as spike train deconvolution [5], [7]. In order to increase the sparsity of 

^Note that a similar estimator using a Laplacian prior for Xi (i = 1, . . . , M) was proposed in [51] for regression problems, 
referred to as the LASSO estimator, but without positivity constraint. 
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the prior, we propose to use the following distribution derived from ga{-) as prior distribution 
for Xi'. 

f {Xi\w, a) = (1 - W)S (Xi) + WQa {Xi) , (7) 

where 5 {■) is the Dirac function. This prior is similar to the LAZE distribution (Laplacian 
pdf and an atom at zero) introduced in [27] and used, for example, in [52], [54] for MRFM. 
However, since ga{xi) is zero for Xi < 0, the proposed prior in (7) accounts for the positivity 
of the non-zero pixel values, a constraint that exists in many imaging modalities such as 
MRFM. By assuming the components Xi to be a priori independent (i — 1,...,M), the 
following prior distribution for x is obtained: 



M 



f (x|w, a) = [(1 - W)6 (Xi) + WQa {Xi)] . (8) 



i=l 



n 9a {Xi) 



(9) 



Introducing the index subsets Jo = {i; Xi = 0} and Ji = Jq = {%] Xi ^ 0} allows one to 
rewrite the previous equation as follows: 

{i-wril5{x,) 

with — card{Jc}, e G {0, 1}. Note that Uq — M — rii and rii — ||x||q where is the 
standard io norm ||x||q = # {i; Xi ^ 0}. 

2) Noise variance prior: A conjugate inverse-Gamma distribution with parameters | and 
I is chosen as prior distribution for the noise variance [43, Appendix A]: 

a>,7-^X6^(^,|). (10) 

In the following, the shape parameter v will be fixed Xo v — 2 and the scale parameter 
7 will be estimated as an hyperparameter (see [13], [14], [40]). Note that choosing the 
inverse-Gamma distribution XQ (f , ^) as a prior for cr^ is equivalent to choosing a Gamma 
distribution ^ (|, |) as a prior for l/cr^. 

C. Hyperparameter priors 

The hyperparameter vector associated with the aforementioned prior distributions is $ = 
{a,7, w}. Obviously, the accuracy of the proposed Bayesian model depends on the values 
of these hyperparameters. Sometimes prior knowledge may be available, e.g., the mean 
number of non-zero pixels in the image. In this case these parameters can be tuned manually 
to their true values. However, in many practical situations such prior information is not 
available and these hyperparameters must be estimated directly from the data. Priors for 
these hyperparameters, sometimes referred to as "hyperpriors" are given below. 
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1) Hyperparameter a: A conjugate inverse-Gamma distribution is assumed for the scale 
parameter a of the distribution Qa (•) of non-zero pixel intensities: 

a\ot r^IQ {ao.aij , (11) 

with a = [cKo, CKi]^. Similarly to [19], the fixed hyperparameters and cti have been chosen 
to obtain a vague prior: ckq = ai = 10~^°. 

2) Hyperparameter 7; A non informative Jeffreys' prior [25], [26] is assumed for the 
scale parameter of the inverse Gamma prior density on the noise variance cr^ : 

/ (7) oc (7) . (12) 

The combination of the priors (10) and (12) yields the non-informative Jeffreys' prior on 
cr^. Note that there is no difference between choosing a non-informative Jeffrey's prior for 
cr^ and the proper hierarchical prior defined by (10) and (12). Indeed, integrating over the 
hyperparameter 7 in the joint / (cr^, 7) distribution yields: 

/(^') = I f[Al)f{l)dl 

' ^7 (13) 



1 

oc — . 

However, in more complex noise models the hierarchical priors / {<7'^\^') and / (7) are not 
equivalent to such a simple prior on cr^. For example, as in [12], this pair of hierarchical 
priors is easily generalizable to conditionally Gaussian noise with spatial correlation and 
spatially varying signal-to-noise ratio. 

3) Hyperparameter w: A uniform distribution on the simplex [0, 1] has been chosen as 
prior distribution for the mean proportion of non-zero pixels: 

Wr^U{[Q,l\). (14) 

This is the least informative prior on the image sparsity factor. Assuming that the individual 
hyperparameters are statistically independent the full hyperparameter prior distribution for $ 
can be expressed as: 

/(#|a) = /(«;)/ (7) /(a) 
1 / 



X l[o,i] {w) 1]R+ (a) 1]K+ (7) , 
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Fig. 1. DAG for the parameter priors and hyperpriors (the fixed non-random hyperparameters appear in dashed boxes). 



D. Posterior distribution 

The posterior distribution of {0, ^} can be computed as follows: 

/ (0, $|y, a) cx / (y|0) / / ($|a) , (16) 

with 

/(0|*) = /(x|a,^)/(a^|7), (17) 

where / {y\d) and / {*^\ol) have been defined in (3) and (15). This hierarchical structure, 

represented on the directed acyclic graph (DAG) in Fig. 1, allows one to integrate out the 

parameter cr^ and the hyperparameter vector $ in (16) to obtain the posterior of the image 

given the measured data and the parameters x: 

5(l + ni,l + no) 
/ (^1^' °^ 1 7^1 ^ 

y — T (K, x) 

r (ni + ao) ^ 

In (18), as defined in paragraph III-B.l, ni = ||x||q, uq = M — ||x||q and B (■, ■) stands for 
the Beta function B (u, v) =T (u) T (v) /T {u + v), where r(-) denotes the Gamma function. 

The next section presents an appropriate Gibbs sampling strategy [44, Chap. 10] that allows 
one to generate an image sample distributed according to the posterior distribution / (x|y, a). 
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IV. A GiBBS SAMPLING STRATEGY 
FOR SPARSE IMAGE RECONSTRUCTION 

In this section, we describe the Gibbs sampling strategy for generating samples {x^*^}^^^ 
distributed according to the posterior distribution in (18). As simulating directly according 
to (18) is difficult, it is much more convenient to generate samples distributed according to 
the joint posterior / (x, cr^, a, w|y, a). This Gibbs sampler produces sequences {x*^*^}^^^^ , 
|(j2(t)j^^^ 1^(4) 1^^^ , {^'-*'*}^^^ which are Markov chains with stationary distributions 
/ (x|y, ct), f ((7^|y, a), / (a|y, a) and / {w\y, a), respectively [44, p. 345]. Then, the MAP 
estimator of the unknown image x will be computed by retaining among X — {x^*^}^^^ 
the generated sample that maximizes the posterior distribution in (18) [35, p. 165]: 

Xmap = argmax/ (x|y) 

--f (19) 
fti argmax / (x|y) . 

The main steps of this algorithm are given in subsections IV-A and IV-D (see also Algorithm 1 
below). 



Algorithm 1: 
Gibbs sampling algorithm for sparse image reconstruction 

• Initialization: 

- Sample parameter x^"' from the pdf in (9), 

- Sample parameter from the pdf in (10), 

- Set t ^ 1, 

• Iterations: for t = 1, 2, . . . , do 

1. Sample hyperparameter w^*' from the pdf in (21), 

2. Sample hyperparameter a^*^ from the pdf in (22), 

3. For z = 1, . . . , M, sample parameter xf^ from the pdf in (23), 

4. Sample parameter ct2(*) from the pdf in (26), 

5. Sett^t + 1. 
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A. Generation of samples according to / (w |x) 
Using (9), the following result can be obtained: 

/(w|x) oc (l-u))"»w"S (20) 

where uq and ni have been defined in paragraph III-B.l. Therefore, samples from f {w\x.) 
can be generated by simulating from an image dependent Beta distribution: 

w |x ~ ,Be(l + ni, 1 + no) . (21) 

B. Generation of samples according to / (a |x, a) 

The form of the joint posterior distribution (16) implies that samples of a can be generated 
by simulating from an image dependent inverse-Gamma distribution: 

a |x, a ~ IQ (||x||o + cto, ||x||^ + ai) . (22) 

C. Generation of samples according to / (x [w, a, u^, y ) 

The LAZE-type prior (7) chosen for (i = 1, . . . , M) yields a posterior distribution of x 
that is not closed form. However, one can easily derive the posterior distribution of each pixel 
intensity Xi (i — 1, . . . ,M) conditioned on the intensities of the rest of the image. Indeed 
straightforward computations (Appendix I) yield: 

/ (xi\w, a, cr^ x_i, y) oc (1 - Wi)6 (xi) 

(23) 

+ Wi(j)+ {xi\ni,r]J) , 

where x_j stands for the vector x whose ith component has been removed and jii and iff 
are given in Appendix I. In (23), 0+ (•, m, s^) stands for the pdf of the truncated Gaussian 
distribution defined on R;^ with hidden mean and variance parameters equal to m and s^, 
respectively: 

liRl {x) , (24) 



with 

The form in (23) specifies Xjjty, a, cr^, x_j, y as a BemoulU-truncated Gaussian variable with 
parameter [wi, jii^ifl). Appendix III presents an algorithm that can be used to generate 
samples from this distribution. This algorithm generates samples distributed according to 
/ (x |w, cr^, a, y ) by successively updating the coordinates of x using a sequence of M Gibbs 
moves (requiring generation of Bernoulli-truncated Gaussian variables). 
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(x, m, S^) = 



C (m, s2) 



exp 



(X 



my 



2s2 



erf 



m 



(25) 



D. Generation of samples according to f (cr^ |x, y) 
Samples are generated in the following way: 

a \^,y^ig\^-, j. 



V. Simulation on synthetic images 

TABLE I 

Parameters used to compute the MRFM psf. 



Parameter 


Value 


Description 


Name 


Amplitude of external magnetic field 


-Bext 


9.4 X 10^ 


G 


Value of Bmag in the resonant slice 




1.0 X 10* 


G 


Radius of tip 


Ro 


4.0 nm 




Distance from tip to sample 


d 


6.0 nm 




Cantilever tip moment 


m 


4.6 X 10^ emu 


Peak cantilever oscillation 




0.8 nm 




Maximum magnetic field gradient 




125 






Fig. 2. Left: Psf of the MRFM tip. Right: unknown sparse image to be estimated. 



September 23, 2009 



12 



A. Reconstruction of 2-dimensional image 

In this subsection, a 32 x 32 synthetic image, depicted in Fig. 2 (right panel), is simulated 
using the prior in (9) with parameters a — 1 and w — 0.02. In Figs. 2 and 3, white pixels 
stands for zero intensity values. A general analytical derivation of the psf of the MRFM 
tip has been given in [34] and with further explanation in [54]. Following this model, we 
defined a 10 x 10 2-dimensional convolution kernel, the psf represented in Fig. 2 (left panel), 
that corresponds to the physical parameters shown in Table I. The associated psf matrix 
H introduced in (2) is of size 1024 x 1024. The observed measurements y, which are of 
size P — 1024 and depicted in Fig. 3 (top panel), are corrupted by an additive Gaussian 
noise with two different variances cr^ = 1.2 x 10~^ and cr^ = 1.6 x 10~^, corresponding to 
signal-to-noise ratios SNR = 2dB and SNR = 20dB, respectively. 

1) Simulation results: The observations are processed by the proposed algorithm using 
Nuc = 2000 iterations of the Gibbs sampler with A^'bi = 300 burn-in iterations. The compu- 
tation time for completing 100 iterations of the proposed algorithm is 80s for an unoptimized 
MATLAB 2007b 32bit implementation on a 2.2GHz Intel Core 2, while 100 iterations of the 
Landweber and empirical Bayesian algorithms require 0.15s and 2s, respectively. The MAP 
image reconstruction computed using (19) is depicted in Fig. 3 (bottom panel) for the two 
levels of noise considered. Observe that the estimated image is very similar to the actual 
image. Fig. 2 (right panel), even at low SNR. 

Moreover, as the proposed algorithm generates samples distributed according to the poste- 
rior distribution in (18), these samples can be used to compute the posterior distributions of 
each parameter. For illustration, the posterior distributions of the hyperparameters a and w, 
as well as the noise variance cr^, are shown in Fig. 4, 5 and 6. These estimated distributions 
are in good agreement with the ground truth values of these parameters, randomly drawn 
from the prior distribution. 

In many applications, a measure of confidence that a given pixel or pixel region is non-zero 
is of interest. Our Bayesian approach can easily generate such measures of confidence in the 
form of posterior probabilities of the specified event, sometimes known as the Bayesian p- 
value. Following the strategy detailed in Appendix EI, the proposed Gibbs sampler generates a 
collection of samples {x'^*^}^^^ ^ , distributed according the posterior Bernoulli-truncated 
Gaussian distribution in (23). This sampling requires the generation of indicator variables Zi 
(i = 1, . . . ,n) that reflect the presence or the absence of non-zero pixel values. It is the 
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Fig. 3. Top, left (resp. right): noisy observations for SNR — 2dB (resp. 20dB). Bottom, left (resp. right): reconstructed 
image for SNR = 2dB (resp. 20dB). 




Fig. 4. Posterior distribution of hyperparameter a (left: SNR = 2dB, right: SNR = 20dB). 
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Fig. 5. Posterior distribution of liyperparameter w (left: SNR — 2dB, right: SNR — 20dB). 




0.15 




x10 



Fig. 6. Posterior distribution of hyperparameter (left: SNR = 2dB, right: SNR = 20dB). 



indicator variable Zi that Xi > that provides information about non-zero pixels in the 
image. Using the equivalences {zi = 0} ^ {xi = 0} and {zi = 1} <S=^ {xi > 0}, the posterior 
probability P [xi > 0|y, a] can be easily obtained by averaging over the Gibbs samples of the 
binary variables < zf^ \ . To illustrate, these probabilities are depicted in Fig. 7. 

I J i=Afbi+i,.-,AfMC 

In addition, these Gibbs samples can be used to compute the probability of having non-zero 
pixels in a given area of the image. The estimated posterior probability for the event that a 
non-zero pixel is present inside the small red rectangle in the figure is equal to 45% for the 
case of SNR = 2dB. Conversely, the posterior probability of having a non-zero pixel in the 
green box is 5%. For SNR = 20dB the MAP algorithm correctly detects up the presence of a 
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pixel in this region. On the other hand, even though at SNR = 2dB the MAP reconstruction 
has not detected this pixel, we can be 45% confident of the presence of such a pixel in the 
red rectangular region on the left panel of Fig. (7). 




5 10 15 20 25 30 



5 
10 
15 
20 
25 
30 



10 



15 20 25 30 



Fig. 7. Posterior probabilities of having non-zero pixels (left: SNR = 2dB, right: SNR — 20dB). The probability of 
having at least one non-zero pixel in the red (resp. green) box-delimited area is 45% (resp. 5%). 

The posterior distributions of four different pixels are depicted in Fig. 8. These posteriors 
are consistent with the actual values of these pixels that are represented as dotted red lines 
in these figures. In particular, in all cases the actual values all lie within the 75% central 
quantile of the posterior distribution. 

2) Comparison of reconstruction performances: Here we compare our proposed hierar- 
chical Bayesian method to the sparse reconstruction methods of [52], [54]. The techniques 
proposed in [52], [54] are based on penalized likelihood EM algorithms that perform empirical 
estimation of the unknown hyperparameters. Therein, two empirical Bayesian estimators, 
denoted Emp-MAP-Lap and Emp-MAP-LAZE, based on a Laplacian or a LAZE prior 
respectively, were proposed. We also compare to the standard Landweber algorithm [31] 
that has been previously used to perform MRFM image reconstruction [8], [57]. These are 
compared to our hierarchical Bayesian MAP reconstruction algorithm, given in (19), and 
also to a minimum mean square error (MMSE) reconstruction algorithm extracted from the 
estimated full Bayes posterior (18). The MMSE estimator of the image x is obtained by 
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Fig. 8. Posterior distributions of thie non-zero values of x for 4 different pixel locations and for SNR — 20dB (actual 
pixel intensity values are depicted with dotted red lines). 



empirical averaging over the last A^^ = 1700 samples of the Gibbs sampler according to: 

Xmmse = E [x|y] 



1 ^ - . (27) 



' t=\ 



As in [54] we compare the various reconstruction algorithms with respect to several 
performance criteria. Let e = x — x denote the reconstruction error when x is the estimator of 
the image x to be recovered. These criteria are: the £oj and £2-norms of e, which measures 
the accuracy of the reconstruction, and the £o-norm of the estimator x, which measures its 
sparsity. As pointed out in [54], a human observer can usually not visually detect the presence 
of non-zero intensities if they are below a small threshold. Thus, a less strict measure'^ of 

''The introduced measure of sparsity is denoted |H|^. This is an abuse of notation since it is not a norm. 
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sparsity than the io-novm, which is denoted H-H^, is the number of reconstructed image pixels 
that are less than a given threshold 6: 

M 

(28) 

i=\ 

It what follows, b has been chosen as 5 = 10^^ ll^lloo- "^'^ summarize, the following criteria 
have been computed for the image in paragraph V-A.l for two levels of SNR: ||e||Q, ||e||^, 
||e||2, ||x||o and HxH^. 

Table II shows the six performance measures for the five different algorithms studied. The 
proposed Bayesian methods (labeled "proposed MMSE" and "proposed MAP" in the table) 
outperform the other reconstruction algorithms in terms of i\ or £2-norms. Note that the 
MMSE estimation of the unknown image is a non sparse estimator in the £o-norm sense. 
This is due to the very small but non-zero posterior probability of non-zero value at many 
pixels. The sparsity measure ||-||_5 indicates that most of the pixels are in fact very close 
to zero. The MAP reconstruction method seems to achieve the best balance between the 
sparsity of the solution and the minimization of the reconstruction error. Of course, by its 
very construction, the MMSE reconstruction will always have lower mean square error. 

B. Reconstruction of undersampled 3 -dimensional images 

As discussed below in Section VI, the prototype IBM MRFM instrument [8] collects data 
projections as irregularly spaced, or undersampled, spatial samples. In this subsection, we 
indicate how the image reconstruction algorithm can be adapted to this undersampled scenario 
in 3D. We illustrate by a concrete example. First, a 24 x 24 x 6 image is generated such that 4 
pixels have non-zero values in each z slice. The resulting data is depicted in Fig. 9 (top) and 
Fig. 10 (left). This image to be recovered is assumed to be convolved with a 5 x 5 x 3 kernel 
that is represented in Fig. 10 (right). The resulting convolved image is depicted in Fig. 11 
(left). However, the actual observed image is an undersampled version of this image. More 
precisely, the sampling rates are assumed to be dx — 2, dy — 3, d^ — 1, respectively, in the 
3 dimensions. Consequently the observed 3D image, shown in Fig. 11, is of size 12 x 8 x 6. 
Finally, an i.i.d. Gaussian noise with a = 0.02 is added following the model in (1). Note 
that under these assumptions, the application T (•, •) can be spht into two standard operations 
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TABLE n 

Reconstruction performances for different sparse image reconstruction algorithms. 



Method 


Error criterion 


Hello 


l|e||. 


l|e|k 


M, 


l|A|lo 


11*11. 






SNR = 


2dB 








Landweber 


1024 


990 


339.76 


13.32 


1024 


990 


Emp-MAP-Lap 


18 


17 


14.13 


4.40 








Emp-MAP-LAZE 


60 


58 


9.49 


1.44 


55 


55 


Proposed MMSE 


1001 


30 


3.84 


0.72 


1001 


27 


Proposed MAP 


19 


16 


2.38 


0.81 


13 


13 






SNR = 


20dB 








Landweber 


1024 


931 


168.85 


6.67 


1024 


931 


Emp-MAP-Lap 


33 


18 


1.27 


0.31 


28 


23 


Emp-MAP-LAZE 


144 


19 


1.68 


0.22 


144 


27 


Proposed MMSE 


541 


5 


0.36 


0.11 


541 


16 


Proposed MAP 


19 


7 


0.39 


0.13 


16 


16 



following the composition: 

r(/c,X)=^d^,d„d.(K(8)X), (29) 

where gd^,dy,dz (■) stands for the undersampling function. 

The proposed hierarchical Bayesian algorithm is used to perform the sparse reconstruction 
with undersampled data. The number of Monte Carlo runs was fixed to A^mc = 2000 with 
A^bi — 300 bum-in iterations. Figure 9 shows the result of applying the proposed MAP 
estimator to the estimated posterior. 

VI. Application on real MRFM images 

Here we illustrate the hierarchical Bayesian MAP reconstruction algorithm for real three 
dimensional MRFM data. The data is a set of MRFM projections of a sample of tobacco 
virus. Comprehensive details of both the experiment and the MRFM data acquisition protocol 
are given in [8] and the supplementary materials [9]. The observed sample consists of a 
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z= 1 



z = 2 



z = 3 



z = 4 



z = 5 



z = 6 





Fig. 9. Top: slices of the sparse image to be recovered. Bottom: slices of the estimated sparse image. 




Fig. 10. Left: 24 x 24 x 6 unknown image to be recovered. Right: 5x5x3 kernel modeling the psf. 



collection of Tobacco mosaic virus particles that constitute a whole viral segment in addition 
to viral fragments. The projections are computed from the measured proton distribution and 
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X 



Fig. 11. Left: 24 x 24 x 6 regularly sampled convolved image. Left: 12 x 8 x 6 undersampled observed image. 

the 3-dimensional psf following the protocol described in [8] and [9]. The resulting scan data 
are depicted in Figure 12 (top) for four different distances between the MRFM tip and the 
sample: d = 24nm, d = 37nm, d = 50nm and d = 62nm. Each of these x-y slices is of size 
60 X 32 pixels. 

These experimental data are undersampled, i.e. the spatial resolution of the MRFM tip, and 
therefore the psf function, is finer than the resolution of the observed slices. Consequently, 
these data have been deconvolved taking into account the oversampling rates defined by 
dx = 'i, dy = 2 and dz = S in the three directions. The MAP estimate of the unknown image 
is computed from A^mc = 1000 Gibbs samples of the proposed Bayesian algorithm initialized 
with the output of a single Landweber iteration. Several more iterations of the Landweber 
algorithm would produce the reconstructions reported in [8]. Three horizontal slices of the 
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Fig. 12. Top: experimental scan data where black (resp. white) pixel represents low (resp. high) density of spin (as in 
[8]). Middle: scan data reconstructed from the proposed hierarchical Bayesian algorithm. Bottom: scan data reconstructed 
from the Landweber algorithm. 



estimated image^ are depicted in Figure 13. A 3-dimensional view of the estimated profile of 
the virus fragments is shown in Figure 14. The MMSE estimates of the parameters introduced 
in Section ni are (Tmmse — 0-10' ^mmse = 1-9 x 10~^^ and Wmmse — 1-4 x 10~^. The image 
reconstructions produced by the Landweber and Bayesian MAP algorithms are shown in Fig. 
12. 

By forward projecting the estimated virus image through the point spread function one 
can visually evaluate the goodness of fit of the reconstruction to the raw measured data. This 
is depicted in Fig. 12. These figures are clearly in good agreement with the observed data 
(top). To evaluate the convergence speed, the reconstruction error is represented in Figure 15 

^Note that most part of the estimated 3 dimensional image is empty space due to the very localized proton spin centers 
in the image. 
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Fig. 13. Three horizontal slices of the estimated image. 




Fig. 14. 3-dimensional view of the estimated profile of the Tobacco virus fragments. 



as a function of the iterations for the proposed Bayesian and the Landweber algorithms. This 
shows that the convergence rate of our algorithm is significantly better than the Landweber 
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250 




Iterations 

Fig. 15. Error of the reconstructions as functions of the iteration number for the proposed algorithm (continuous blue line) 
and Landweber algorithm (dotted red line). 

VII. Conclusions 

This paper presented a hierarchical Bayesian algorithm for deconvolving sparse positive 
images corrupted by additive Gaussian noise. A Bernoulli-truncated exponential distribution 
was proposed as a prior for the sparse image to be recovered. The unknown hyperparameters 
of the model were integrated out of the posterior distribution of the image, producing a 
full posterior distribution that can be used for estimation of the pixel values by extracting 
the mode (MAP) or the first moment (MMSE). An efficient Gibbs sampler was used to 
generate approximations to these estimates. The derived Bayesian estimators significantly 
outperformed several previously proposed sparse reconstruction algorithms. Our approach 
was implemented on real MRFM data to reconstruct a 3D image of a tobacco virus. Future 
work will include extension of the proposed method to other sparse bases, inclusion of 
uncertain point spread functions, and investigation of molecular priors. Future investigations 
might also include a comparison between the proposed MCMC approach and variational 
Bayes approaches. 

Appendix I 
Derivation of the conditional 
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POSTERIOR DISTRIBUTION / {Xi |w, a, cr^, x_j, y ) 

The posterior distribution of each component Xi (i — 1, . . . , M) conditionally upon the 
others is linked to the likelihood function (3) and the prior distribution (7) via the Bayes' 
formula: 

/ {xi\w, a, CT^ x_i, y) oc / (y|x, a^) / {xi\w, a) . (30) 
This distribution can be easily derived by decomposing x on the standard orthonormal basis 

B = {ui,...,um}, (31) 
where is the ith column of the M x M identity matrix. Indeed, let decompose x as follows: 

X = Xj + XiUi, (32) 

where Xj is the vector x whose ith element has been replaced by 0. Then the linear property 
of the operator T (k, ■) allows one to state: 



T{K,x)^T{K,ii)+XiT{K,Ui) 

Consequently, (30) can be rewritten 

f {xi\w,a,a^,x-i,y) oc exp 



(33) 



Gi Xi\li\ 



X 



w 



(1 — w)5 (xi) H — exp 



2a2 



1r1 (xi 



(34) 



where 



(35) 



= y - T (k, Xj) , 
hj = T {k, Ui) . 

An efficient way to compute within the Gibbs sampler scheme is reported in Appendix II. 
Then, straightforward computations similar to those in [7] and [37, Annex B] yield to the 
following distribution: 

/ {xi\w, a, a^, x_j, y) oc (1 - Wi)d (xi) 

+ Wi(l)+ {xi\iii,r]i) , 

with 



(36) 



f^i = Vi 



|2 ' 



2 /'hfei 1 



(37) 



It can be noticed that, for deblurring applications, hi is also tlie ith column of the matrix H introduced in (2). 
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and 2 

Ui = ^C'(/Xi,r/,')exp (1^2 ) > 

« K^TliJ (38) 



Ui 

Wi 



Ui + {1 - w) 

The distribution in (36) is a Bernoulli-truncated Gaussian distribution with hidden mean //j 
and hidden variance rjf. 

Appendix II 
Fast recursive computations 

FOR SIMULATING ACCORDING TO / (x \w, a, a^,y) 

In the Gibbs sampling strategy presented in Section IV, the main computationally expensive 
task is the generation of samples distributed according to / (xj a, ci^, x_i, y ). Indeed, 
the evaluation of the hidden mean and hidden variance in (37) of the Bernoulli-truncated 
Gaussian distribution may be costly, especially when the biUnear application T'(-, ) is not 
easily computable. In this appendix, an appropriate recursive strategy is proposed to accelerate 
the Gibbs sampUng by efficiently updating the coordinate i of the vector x at iteration t of 
the Gibbs sampler. 

Let x*^*''~^) denote the current Monte Carlo state of the unknown vectorized image x 
(z = l,...,M): 



^1 , ■ ■ ■ , •I'i-i, •I'i ) -^i+l ) ■ ■ ■ ) -^M 



(39) 

with, by definition, x(*'°^ = x*^*~^'^^. Updating x*^*''~^^ consists of drawing xf ^ according to 
the Bernoulli-truncated Gaussian distribution / w,a,a'^,x.^l'-~^\y^ in (23) with: 

St) jt) (t-i) (t-iy 



(40) 



The proposed strategy to simulate efficiently according to (23) is based on the following 
property. 

Property: Given the quantity T (k, x*^°)) and the vectors {hj} .^^^ ^, simulating according 
to / w, a, cr^, x^-*'', can be performed without evaluating the bilinear function T (•, •). 

Proof: Simulating according to (23) mainly requires to compute the vector introduced 
by (35): 

e, = y-T(K,±f''-'A, (41) 
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with 



~{t,i-i) 
x; 



^1 , ■ ■ ■ , u, Xj^^ , . . . , 



(42) 



Moreover, by using the decomposition in (32) and by exploiting the linear property of T (k, ■), 
the vector T ^K.,xf'*~^''^ in the right-hand side of (41) can be rewritten as: 

T (/c, xf '^-^)) = T (/c, x(*'-i)) - xf-'\, (43) 

where hj has been introduced in (35). Consequently, to prove the property, we have to 
demonstrate that the vector series {T (k, x^*'*^)) }^^^ ^ can be computed recursively without 
using T {■,•). Assume that T (/t, x^*'*"^)) is available at this stage of the Gibbs sampling and 
that xf^ has been drawn. The new Monte Carlo state is then: 

T 



Similarly to (43), the vector T (At,x(*''') can be decomposed as follows: 

r(/c,x(*''))=r(/c,xf'^-^))+xfh,. 

Therefore, combining (43) and (45) allow one to state: 



(44) 



(45) 



T(K,x(*'^))=T(K,x(*'^-i)) + (a;S 



(t-i) 



)hi. 



The bilinear function T (-, ■) only needs to be used at the very beginning of the Gibbs sam- 
pling algorithm to evaluate T (k, x^*^)) and the vectors {hj} .^^ ^. The resulting simulation 
scheme corresponding to step 3 of Algorithm 1 is shown in Algorithm 2. 



Appendix III 
Simulation according to a 
Bernoulli-truncated Gaussian distribution 

This appendix describes how we generate random variables distributed according to a 
Bernoulli-truncated Gaussian distribution with parameters {wi, iii^rjj) whose pdf is: 

/ {xi\wi, Hi, rjf) = (1 - Wi) 6{xi) 

{Xi - Hif 



+ 



CM) 



exp 



1r* i^i) 
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Algorithm 2: 



Efficient simulation according to / (x | w, a, , y ) 



For i = 1, . . . ,M, update the ith coordinate of the vector 

^(t.'-i) - 



(f) (t) (f-i) (t-1) (t-1) 

•^1 5 • • • 5 ■''i-l) -^i ) ■''i+l ) • • • ) •''M 



via the following steps: 

1. compute ||hi||^, 

2. set T (k, if '*"'^) = T {k, x(*.*-i)) - xf-'^h^, 

3. setej =x-r(/«,if'*"^^), 

4. compute /Xj, ?7? and as defined in (37) and (38), 

5. draw xf^ according to (23), 



6. setx(*'') 



„(*) 



(f) (t) (t-1) 



(t-1) 



7. set T (m, x(*'*)) = T (k, if 



xf^h,. 



Algorithm 3: 

Simulation according to 
a Bernoulli-truncated Gaussian distribution 



1. generate Zi according to Zi ~ Ber (wi), 
Xi = 0, if Zi = 0; 

Xi N+ {ni,rif) , if 2i = 1. 



2. set 



where €{111,7]}) has been defined in (25). Monte Carlo draws from this density can be 
obtained by using an auxiliary binary variable Zi following the strategy shown in Algorithm 3. 
This indicator variable takes the value (resp. 1) if the pixel Xi is zero (resp. non-zero). 

In Algorithm 3, Ber (•) and A/""*" (•, •) denote the Bernoulli and the positive truncated 
Gaussian distributions respectively. In step 2, samples distributed according to the truncated 
Gaussian distribution can be generated by using an appropriate accept-reject procedure with 
instrumental distributions [18], [36], [42]. 
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